Voice pitch determination

ABSTRACT

796,677. Vocoder systems. WESTERN ELECTRIC CO., Inc. Oct. 7, 1955 [Oct. 20, 1954], No. 28214/57. Divided out of 796,676. Drawings to Specification. Class 40 (4). [Also in Group XIV (c)] The fundamental period of a complex waveform (e.g. speech) is obtained by applying the signal to a tapped delay line and comparing the undelayed wave with the waves obtained from the tappings and determining the delay for which the output waveform most nearly resembles that of the undelayed wave. The description is identical with part of Specification 796,676. Specification 466,327 also is referred to.

Oct. 13, 1959. G. RAISBECK. VOICE PITCH DETERMINATION. Filed Oct. 20, 1954. 3 Sheets-Sheet 1. Inventor: G. Raisbeck.

Patented Oct. 13, 1959

2,908,761

VOICE PITCH DETERMINATION

Gordon Raisbeck, Bernards Township, Somerset County, N.J., assignor to Bell Telephone Laboratories, Incorporated, New York, N.Y., a corporation of New York

Application October 20, 1954, Serial No. 463,467

16 Claims. (Cl. 179-1555.)

This invention relates to the transmission of speech over narrow band media by vocoder techniques. Its principal object is to improve the accuracy and naturalness of the reproduced speech.

In the vocoder transmission system of Dudley Patent 2,151,091, an input speech wave is analyzed to determine its fundamental frequency or pitch and the distribution of amplitudes among a number of frequency sub-bands into which the speech-frequency range is divided. The result of this analysis is translated into a number of control currents each of which, over and above the first, represents the speech energy in one sub-band, while the first control current represents its fundamental frequency or pitch. These control currents are transmitted to a synthesizer and are there utilized to build up, from sources of energy in the synthesizer, an artificial speech wave having the characteristic pitch and amplitude-frequency distribution of the original impressed speech. More particularly, the synthesizer apparatus includes a source of periodic energy, commonly known as a buzz source, for reproducing voiced sounds, and a source of aperiodic energy, commonly known as a hiss source, for reproducing unvoiced sounds, and the outputs of these sources are applied, in the alternative, to the several members of a bank of filters, each of which embraces one of the sub-bands into which the voice frequency range is divided.

The sub-band control currents derived by the analyzing apparatus control the gain or attenuation in these several sub-band paths. The choice between the buzz and the hiss source is controlled by one characteristic of the pitch control current while the fundamental frequency of the buzz source is controlled by another characteristic of the pitch control current. In other words, the pitch control current tunes the buzz source to the fundamental frequency of the original voice, turns the buzz source on for a voiced sound, indicated by a strong pitch control current, and turns the buzz source off and the hiss source on for an unvoiced sound indicated by a weak pitch control current.

With such apparatus every sound of normal speech is, in the synthesizer apparatus, classed either as a voiced sound or as an unvoiced sound. While such classification is adequate in some cases there are many other cases in which it is inadequate. For example, the sound of the letter Z in the normal English or American pronunciation of the word "azure" partakes equally of the nature of a voiced sound and of an unvoiced sound; so that any classification of such a sound either as voiced or as unvoiced but not both is arbitrary and artificial. Heretofore this artificial classification of all sounds into one or the other of two mutually exclusive categories has been an unfortunate necessity imposed by the lack of any measure of the relative amounts of the original speech energy of the two kinds, periodic or aperiodic.

Another defect of present vocoder systems is that the derivation at the analyzer station of a reliable pitch control current has always presented a difficult problem to the engineer. Many voices are so rich in harmonic components that the energy of the fundamental component is small in comparison with the harmonic energy, and is therefore difficult to segregate. Under some conditions the energy at the fundamental frequency disappears entirely and resort must be had to indirect measures, such as the intermodulation of adjacent harmonic components, to derive a difference frequency. Aside from the apparatus complexities entailed, such difference frequency is a true measure of the voice pitch only in the case of a steady sound, while variations of frequency and of phase of the intermodulated components in the course of inflection cause such instantaneous frequency to be wholly inadequate.

The present invention approaches the problem of pitch determination in the time domain instead of the frequency domain; i.e., it seizes hold of the fundamental period of the voice instead of its fundamental frequency, and tracks it; i.e., continues to hold it, as it changes. It is characteristic of a periodic wave, no matter how complex, that after a certain time interval known as the period its form is a repetition of what has gone before. In the case of an exactly periodic wave the repetition is exact. In the case of a nearly periodic wave (and syllabic rates in speech are so slow compared with voice frequencies of interest that every voiced speech wave is periodic or nearly periodic) the repetition is inexact and approximate, but nevertheless easily recognizable. Such repetition or near repetition of the waveform in successive periods holds good quite aside from the existence of physical energy at the fundamental frequency.

The invention turns these considerations to account by dividing the voice wave into two paths, delaying the energy in one with respect to that in the other by a controllable amount, comparing the delayed wave with the undelayed wave, varying the amount of delay until a best match is obtained, and noting the corresponding amount of delay, with the recognition that this amount of delay is, identically, the fundamental period. A period control current may then be derived which would serve as well as a pitch control current, the period and the pitch being reciprocals of each other. It is preferred, however, in order that presently known buzz source frequency control apparatus may be employed without change, to derive in the first instance a control current which is reciprocally related to the observed fundamental period of the voice; i.e., it is directly proportional to the voice pitch. This indirectly derived pitch control current is indistinguishable from the pitch control current of the prior art except in respect to its greater reliability, and may be employed in the presently known fashion.

There are in general two physical causes for inexactness of the match between the waveform of any fundamental period and that of the prior period. The first cause has to do with the fact that the length of the period changes from period to period in the course of inflection. Inflection rates in voiced speech sounds are so slow, compared with their fundamental frequencies, that this cause is responsible for only a negligible departure from exactness of match, provided only that the apparatus is capable of holding and tracking the fundamental period through its changes.

The second cause has to do with the fact that no [...]tion, therefore, and in addition to determining the delay for which the best match is obtained, there is also determined the degree to which this best match fails of perfection. From this determination a novel control current is derived. It is transmitted to the synthesizer station where it operates to control the relative amounts of energy supplied, respectively, by the buzz source and the hiss source to the filter bank of the spectrum synthesizer apparatus. Thus the switching operation as between these two sources, which characterizes the prior art, is completely dispensed with. The buzz source and the hiss source remain connected to the filter bank at all times and feed the filters of the bank with their respective energies in the proper amounts under control of this novel control current and therefore in accordance with the distribution of energy in the original speech as between its periodic component and its aperiodic component.

It is well known that a quantitative measure of the degree to which any wave matches any other wave is found in the cross-correlation function of these two waves. When one of the waves in question is a delayed replica of the other, the function in question is known as the autocorrelation function. The present system employs autocorrelation techniques for the determination of the fundamental period.
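The matching procedure described above can be sketched numerically. The following is a minimal illustration, not the patented apparatus: the function names, the 8000 Hz sample rate, and the test wave are all assumptions. The test wave contains only the second and third harmonics of a 100 Hz fundamental, so it carries no energy at the fundamental frequency, yet the delay of best match still equals the fundamental period.

```python
import math

def autocorrelation(signal, lag):
    """Average product of the signal with a copy of itself delayed by `lag` samples."""
    n = len(signal) - lag
    return sum(signal[i] * signal[i + lag] for i in range(n)) / n

def estimate_period(signal, min_lag, max_lag):
    """Return the lag (in samples) for which the autocorrelation is largest."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(signal, lag))

SAMPLE_RATE = 8000  # assumed sampling rate in Hz (hypothetical)

# Hypothetical test wave: harmonics at 200 Hz and 300 Hz, nothing at 100 Hz.
wave = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
        + math.sin(2 * math.pi * 300 * n / SAMPLE_RATE)
        for n in range(1600)]

# Search delays from 2.5 ms to 15 ms (20 to 120 samples); zero delay is excluded.
period = estimate_period(wave, min_lag=20, max_lag=120)
pitch_hz = SAMPLE_RATE / period
```

Excluding the shortest delays from the search is a design choice of this sketch: the match at zero delay is always perfect and would otherwise win the comparison.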

The invention will be fully apprehended from the following detailed description of a preferred illustrative embodiment thereof taken in connection with the appended drawings in which:

Fig. 1 is a block schematic diagram showing a vocoder transmission system embodying the invention;

Fig. 2 is a block schematic diagram showing apparatus for deriving a speech period control current and an energy distribution control current;

Fig. 3 is a schematic circuit diagram showing a divider for use in the combination of Fig. 2; and

Fig. 4 is a set of curves which are referred to in the explanation of the invention.

Referring now to the drawings and in particular to Fig. 1, speech currents, which may originate in a telephone instrument 1, are first passed through a unit which acts to hold them to an approximately constant level suitable for the operations of the remainder of the apparatus. It may be a voice-operated gain adjusting device 2 now commonly known as a "Vogad." The output of the vogad is applied in parallel to apparatus shown in the upper part of the figure which derives the novel period control current and the novel energy distribution current, and to conventional spectrum analyzer apparatus shown in a broken line box 3 in the lower part of the figure. These energy paths, like all others subsequently to be described, are shown by single lines in the drawing merely in order to avoid complexity. It will be obvious to those skilled in the art at what points wire pairs or other complete circuits may be required in practice.

The novel apparatus shown in the upper left-hand part of the figure is described below in connection with Fig. 2 which shows its details. For the present it suffices to note that it comprises an autocorrelator 4 having a single input point and a number of output points 6-1 to 6-20, a maximum value selector 7 having a number of input points 6-2 to 6-20 and two output points 8, 9, and a divider 10 having two input points 9, 11 and a single output point 12. The number of output points 6 of the autocorrelator 4 exceeds by one the number of input points of the maximum value selector 7, and each of these, other than the first, is connected to one such input point. The first output point 6-1 of the autocorrelator 4 is connected directly to one input point 11 of the divider 10 while the other input point of the divider 10 is furnished by one of the two output points 9 of the maximum value selector 7. The remaining output point 8 of the maximum value selector 7 carries a signal which may be utilized at the receiver station without further change.

The autocorrelator 4 measures the autocorrelation $\varphi(\tau)$ of the input signal for various discretely different amounts of delay $\tau$, including zero, and delivers on each of its output points 6 a signal which is proportional to the autocorrelation for one such delay value. The maximum value selector 7 picks out the greatest among these, $\varphi(\tau_m)$, from among all the others and supplies it as one input to the divider 10. At the same time the maximum value selector 7 acts to tag or identify the delay value $\tau_m$ for which this greatest autocorrelation was obtained and to deliver an additional identifying signal for use without further change at the synthesizer station. As a matter of convenience the identifying signal is made inversely proportional to the identified delay and therefore directly proportional to its reciprocal. Because the delay for which the autocorrelation is greatest is equal to the fundamental period of the speech, its reciprocal, $1/\tau_m$, is equal to the pitch frequency.
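The roles of the maximum value selector and the divider can be sketched as follows. This is only an illustration under stated assumptions: the function name `select_and_divide`, the 1 ms tap spacing, and the list of correlator outputs are all hypothetical values chosen for the example, not figures from the specification.

```python
def select_and_divide(correlations, tap_spacing_s):
    """correlations[k] is the autocorrelation at delay k * tap_spacing_s.

    Index 0 is the zero-delay output, which goes to the divider only.
    """
    # Maximum value selector: compare every output except the zero-delay one.
    best_tap = max(range(1, len(correlations)), key=lambda k: correlations[k])
    best_delay = best_tap * tap_spacing_s
    # Identifying signal, made proportional to the reciprocal of the selected delay.
    pitch_signal = 1.0 / best_delay
    # Divider: greatest autocorrelation divided by the zero-delay autocorrelation.
    energy_ratio = correlations[best_tap] / correlations[0]
    return pitch_signal, energy_ratio

# Hypothetical correlator outputs for delays of 0, 1, 2, 3, 4 and 5 ms.
outputs = [4.0, 0.5, -1.0, 0.8, 3.0, 0.6]
pitch, ratio = select_and_divide(outputs, tap_spacing_s=0.001)
```

With these assumed values the selector picks the 4 ms tap, so the pitch signal corresponds to 250 Hz and the divider output is 3.0 / 4.0.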

The conventional spectrum analyzer 3 shown in the broken box in the lower part of Fig. 1 comprises a bank of bandpass filters 15 connected in parallel, each of which is proportioned to pass a preassigned sub-band of the voice frequency band of interest, while together they pass the entire band. For the sake of illustration, ten such filters are indicated, the first two and the last being shown. Each such filter 15 is followed by a detector 16 which in turn is followed by a low-pass filter 17. The control current output of each of these several low-pass filters 17 is thus a measure of the voice energy in that sub-band to which such low-pass filter is connected.

These spectrum control currents are transmitted by any desired means, indicated by conductors 18, 19, to a receiver station which comprises conventional synthesizing apparatus shown in the broken line box 20 at the lower right-hand part of the figure. This synthesizer comprises a number of filters 21 having their output terminals connected in parallel to a reproducer 22. These several filters are proportioned to exhibit transmission characteristics like those of the several analyzing filters 15. Shaping networks 23 precede the several filters 21. Each shaping network 23 is supplied, by way of a conductor 24, with locally generated energy from a hiss source 25 and a buzz source 26, while its transmission is modulated by the control current derived at the analyzer station by a corresponding one of the filters 15, 17 and transmitted over the intervening channels 18, 19. Aside from the fact that the locally generated energy supplied to the several shaping networks 23 is continuously a mixture of the outputs of a buzz source 26 and a hiss source 25 in varying proportions, this spectrum reconstruction apparatus is conventional.

Turning now to Fig. 2, the autocorrelator 4 itself is a variant of one shown in Bennett et al. Patent 2,676,206. It comprises a delay device 30 such as an electromagnetic transmission line. The latter may comprise, for example, forty similar sections of series inductance and shunt capacitances having twenty evenly spaced taps 31; i.e., one located at every other section. The line may be terminated in well known fashion for no reflection by a resistive load 32. In terms of propagation time along the transmission line, the spacing between each tap 31 and the next one may be 500 microseconds, the total delay for all 40 sections being thus 10,000 microseconds or 10 milliseconds.

It is well known that the autocorrelation of a wave for any particular value of delay $\tau$ is obtainable by multiplying the delayed wave by the undelayed wave and integrating the product. To this end each of the output taps 31 of the delay line 30 is connected, by way of a buffer amplifier 33, to one input point of a multiplier 34, while the undelayed wave is supplied from the input terminal 5 of the line to the remaining input points of all of these multipliers 34 in parallel. The products of the delayed signal by the undelayed signal thus formed by each of these multipliers for each particular value of delay $\tau$, as represented by the locations of the several taps 31 along the transmission line 30, are now averaged in the time domain by an integrator 35. The output of each such integrator is thus the autocorrelation $\varphi(\tau)$ of the signal for a particular delay $\tau$. In general, the outputs of the several integrators 35 differ from each other in magnitude, some one being greater than the others. This greatest signal is a measure of the best match between the undelayed signal and one of the delayed signals, and the location along the transmission line 30 of the tap 31 on which it appears is a measure of the corresponding delay $\tau_m$.
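A sampled-data analogue of the tapped delay line with one multiplier and one integrator per tap might look like the sketch below. The 8000 Hz sample rate is an assumption (it makes the 500 microsecond tap spacing equal to 4 samples), the 200 Hz test tone is hypothetical, and a plain average stands in for the integrator.

```python
import math

def correlator_bank(signal, tap_spacing, num_taps):
    """One multiply-and-average channel per tap.

    Tap k delays the wave by k * tap_spacing samples; tap 0 is the undelayed wave.
    """
    outputs = []
    for k in range(num_taps):
        delay = k * tap_spacing
        # Multiplier: delayed wave times undelayed wave, sample by sample.
        products = [signal[n] * signal[n - delay]
                    for n in range(delay, len(signal))]
        # Integrator, modelled here as a plain time average.
        outputs.append(sum(products) / len(products))
    return outputs

SAMPLE_RATE = 8000   # assumed; 500 microseconds is then 4 samples
TAP_SPACING = 4      # samples between adjacent taps

# Hypothetical input: a 200 Hz tone, whose period of 5 ms spans 10 tap spacings.
wave = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]
bank = correlator_bank(wave, TAP_SPACING, num_taps=20)

# The greatest output, other than the zero-delay one, marks the period.
best_tap = max(range(1, len(bank)), key=lambda k: bank[k])
```

In this sketch the best match falls on the tenth tap, i.e. at a delay of 10 x 500 microseconds = 5 ms, the period of the assumed tone.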

These several autocorrelation outputs $\varphi(\tau_1)$, $\varphi(\tau_2)$, $\varphi(\tau_3)$, and so forth, other than the first one $\varphi(\tau_0)$, are connected to the several input terminals of a maximum value selector 7. The latter may be a variant of one shown in Davis et al. Patent 2,646,465, and may comprise a plurality of amplifiers, e.g., vacuum tubes 40 of which all the cathodes are connected together and by way of a load resistor 41 to ground, while their anodes are connected by way of individual relay windings 42 to the positive terminal of a source 43 whose negative terminal is grounded. The several input terminals of the selector 7 are provided by the control grids of these amplifiers 40, which must of course be adjusted in their steady potentials by appropriate bias circuits not shown. Each relay 42 is provided with two pairs of contacts 44, 45, shown in their open positions. In each case the inner contact pair 44 acts, when pulled up, to apply the input signal directly to a first common bus 46, while the second contact pair 45 acts, when pulled up, to apply the potential of an individual battery 47 to a second common bus 48. The potential of each of the several batteries 47 differs from that of every other battery 47, and these potentials are selected in accordance with a pattern to be described subsequently.

In operation, let it be assumed that the match of the undelayed signal with the signal delayed by transmission over the line to the third tap 31-2 is more perfect than the match of the undelayed signal with the delayed signal as it appears at any other tap 31. In this case the signal applied to the grid of the second tube 40-2 of the selector 7 exceeds in magnitude the signal applied to the grid of any other tube 40 of the array. While conduction may well start in others of these tubes, the greater anode current of the second tube, flowing to ground by way of the cathode resistor 41, establishes a voltage drop across this resistor sufficient to raise the potentials of the cathodes of all of these tubes 40 to levels such that the conduction of all tubes other than the second is negligible. Since the second tube 40-2 is the only one which conducts in a significant amount, the contacts of the second relay 42-2 are pulled up and no others. By the closing of the inner contacts 44-2 a signal proportional to the autocorrelation $\varphi(\tau_2)$ for the second delay value $\tau_2$ is applied directly to the first common bus 46. By the closing of the outer contacts 45-2 the potential of the second battery 47-2 is applied directly to the second common bus 48.

Thus, under each of n different possible conditions, where n is the number of different values of delay $\tau$ which the autocorrelator 4 provides, only one of the tubes 40 conducts significantly, and one and only one relay contact pair 44, 45 is closed, thus placing on the first common bus 46 the magnitude of the autocorrelation $\varphi(\tau)$ for a particular delay and, on the second common bus 48, the potential of a particular battery 47 which is uniquely associated with that delay. Furthermore the autocorrelation $\varphi(\tau)$ and the corresponding delay $\tau$ thus selected are the maximum autocorrelation $\varphi(\tau_m)$ and the corresponding delay $\tau_m$.

In order that conventional receiver apparatus may be employed with a minimum of change, the potentials of the several batteries 47 are selected in inverse proportion to the several values of delay $\tau$ with which they are associated. This is indicated on the output lead of the second common bus 48 by a legend showing that the signal thereon is the reciprocal of that delay $\tau_m$ for which the autocorrelation is a maximum. From what has been stated above this is evidently proportional to the fundamental frequency or pitch of the speech.
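The choice of battery potentials in inverse proportion to delay can be tabulated with a short sketch. The 500 microsecond spacing comes from the description; the count of nineteen non-zero delays, the function name, and the unit scale factor are assumptions made only for illustration.

```python
TAP_SPACING_S = 500e-6   # tap spacing given in the description
NUM_DELAYS = 19          # assumed: one selector input per non-zero delay

def battery_potential(tap_index, scale=1.0):
    """Potential for a tap, taken as scale / delay (scale is a hypothetical constant)."""
    return scale / (tap_index * TAP_SPACING_S)

# With scale = 1 the potential reads directly as a pitch frequency in Hz.
pitch_values_hz = [battery_potential(k) for k in range(1, NUM_DELAYS + 1)]

# Steps between adjacent selectable pitch values, largest first.
steps_hz = [a - b for a, b in zip(pitch_values_hz, pitch_values_hz[1:])]
```

The steps between adjacent values are far from uniform: under these assumptions the first two taps correspond to 2000 Hz and 1000 Hz, while the last two differ by only a few hertz, which is why the output changes in abrupt transitions of varying size.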

By employing a very large number of delay line taps 31 and a correspondingly large number of recognizing circuits in the maximum value selector 7 the variations of this signal may be made substantially continuous. With a smaller number, e.g., twenty of such circuits as contemplated, this output signal naturally contains somewhat abrupt transitions. These, however, may readily be eliminated by the interposition of a low-pass filter 49.

Similarly the changes, as operation proceeds, from recognition of one autocorrelation signal as the greatest to recognition of another one as the greatest, are reflected in abrupt transitions in the magnitude of the signal on the first common bus 46. As before, these transitions may in principle be reduced as far as desired by the employment of a large number of delay line taps 31 in the autocorrelator 4 and a correspondingly large number of recognizing circuits in the selector. In practice, however, it is sufficient to interpose a low-pass filter 50.
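The smoothing action of the low-pass filters 49 and 50 can be illustrated with a one-pole recursive filter. This is a stand-in chosen for brevity, not the filter design of the specification; the function name, the smoothing constant, and the stepped input are hypothetical.

```python
def one_pole_lowpass(samples, alpha):
    """Smooth a sequence with y[n] = y[n-1] + alpha * (x[n] - y[n-1]), 0 < alpha <= 1."""
    smoothed = []
    state = samples[0]          # start from the first input value
    for x in samples:
        state += alpha * (x - state)
        smoothed.append(state)
    return smoothed

# Hypothetical selector output: an abrupt jump as the best match moves to another tap.
stepped = [200.0] * 5 + [222.0] * 20
smoothed = one_pole_lowpass(stepped, alpha=0.3)
```

The smoothed sequence approaches the new level gradually instead of jumping, at the cost of a short lag set by `alpha`.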

The filtered signal from the first common bus 46 is applied to one of the two input points 9 of a divider 10 while the autocorrelation $\varphi(\tau_0)$ of the speech wave for zero delay is applied to the other input point 11 of this divider. The divider 10 itself may be any unit which forms on its output lead 12 the quotient of a signal applied to its upper input lead 9 divided by another signal applied to its lower input lead 11. Various constructions are possible, a suitable one being shown in Fig. 3 which carries out a logarithmic division operation by electronic means.

That is to say, it carries out the following operations:

(1) It takes the logarithm of the signal applied to its upper input lead;

(2) It takes the logarithm of the signal applied to its lower input lead;

(3) It subtracts the second result from the first result; and, finally,

(4) It takes the antilogarithm of the difference.

The current $i$ through a rectifier diode is related to the voltage applied across it by

$$i = i_0\, e^{qV/kT} \tag{1}$$

where $V$ is the applied voltage,

$e$ is the base of Naperian logarithms,

$T$ is the absolute temperature,

$k$ is Boltzmann's constant,

$q$ is the electronic charge,

$i_0$ is a constant; i.e., the current which flows in the absence of any applied voltage $V$.

From this it follows that the voltage is proportional to the logarithm of the current; i.e.,

$$V = \frac{kT}{q}\,\log\frac{i}{i_0} \tag{2}$$

These relations, known for some years to hold exactly in principle, have more recently been found to hold to a very good approximation in practice, especially with rectifiers of the P-N junction variety. See, for example, a note by Goucher, Pearson, Sparks, Teal and Shockley, published in the Physical Review for 1951, volume 81, No. 4, page 637.
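The logarithmic division that the divider performs can be sketched numerically. The function name is hypothetical, the inputs are assumed positive, and taking the antilogarithm as the last step is an assumption of this sketch consistent with the exponential diode characteristic described above.

```python
import math

def log_divide(dividend, divisor):
    """Quotient of two positive signals, formed by subtracting their logarithms."""
    log_dividend = math.log(dividend)          # logarithm of the upper input
    log_divisor = math.log(divisor)            # logarithm of the lower input
    difference = log_dividend - log_divisor    # subtraction of the two results
    return math.exp(difference)                # antilogarithm (assumed final step)

# Hypothetical inputs: maximum autocorrelation 3.0, zero-delay autocorrelation 4.0.
quotient = log_divide(3.0, 4.0)

# Scaling both inputs by the same factor leaves the quotient unchanged.
scaled_quotient = log_divide(30.0, 40.0)
```

Working in logarithms turns division into subtraction, which is the operation an analog circuit can carry out directly on voltages.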

These properties of P-N junction rectifier diodes are turned to account in the construction of the divider circuit of Fig. 3 which comprises two vacuum tube triodes 51, 52, the anode of each being connected to the positive terminal E+ of an operating potential source 53 and the cathode of each being connected by way of a resistor $R''$ to its negative terminal E-. These resistors $R''$ are of sufficient magnitude, as compared with the internal resistances of the tubes 51, 52, to provide substantial cathode follower action. One input signal $V_1$, which is to serve as the dividend, appears at a first input terminal 54 and is applied by way of a resistor of magnitude R to the control grid of the left-hand tube 51. A second input signal $V_2$, which is to serve as the divisor, appears at another input terminal 55 and is applied by way of another resistor, also of magnitude R, to the control grid of the right-hand tube 52. Each of these control grids is returned, by way of a P-N junction rectifier diode 56, 57, to a suitable point of intermediate potential of the source 53, here shown as ground. With these connections cathode current flows to each of the tubes 51, 52 in an amount sufficient to hold its control grid, in the absence of an input signal, at cutoff, while the input rectifiers 56, 57 are operated in their forward directions, so that their resistances are small compared with the resistors R to which they are connected. A third P-N junction rectifier diode 58 interconnects the two cathodes, and an output signal is derived across a resistor R in series with the anode of the right-hand tube 52; i.e., between the right-hand tube anode and ground.

The input diode current is in each case, therefore, proportional to the input signal voltage; i.e.,

$$i' = \frac{V_1}{R} \tag{3}$$

and

$$i'' = \frac{V_2}{R} \tag{4}$$

Hence, from Equation 2 the grid voltages applied to the two tubes 51, 52 are given by

$$V_3 = \frac{kT}{q}\,\log\frac{V_1}{R\,i_0'} \tag{5}$$

and

$$V_4 = \frac{kT}{q}\,\log\frac{V_2}{R\,i_0''} \tag{6}$$

where $i_0'$ and $i_0''$ are the currents which flow in the first diode 56 and in the second diode 57, respectively, when large reverse voltages are applied.

By virtue of the cathode follower action the cathode potentials are in turn given by

[Equations (7) and (8) are not recoverable from the source.]

From elementary circuit considerations, the voltage drop across the third diode is equal to $V_5 - V_6$. From Equation 1 the current through the third diode 58 is an exponential function of the voltage drop across it; i.e.,

$$i''' = i_0'''\, e^{q(V_5 - V_6)/kT} \tag{9}$$

where $i_0'''$ is the current which flows in the third diode 58 in the absence of an applied voltage. But when Equation 8 is subtracted from Equation 7 as required by the exponent in Equation 9 and the logarithm of both sides is taken, there results

$$i''' = i_0'''\,\frac{i_0''}{i_0'}\,\frac{V_1}{V_2} \tag{10}$$

that is to say, the current through the third diode 58 is proportional to the quotient of the dividend signal by the divisor signal.

It remains to extract an output proportional to this third diode current without upsetting the operation of the circuit. This may be done by the inclusion of the load resistor in series with the anode of the right-hand tube 52. Its magnitude R is preferably substantially smaller than that of the other resistors discussed heretofore. With this arrangement the voltage $V_8$ is evidently equal to the positive potential of the operating source 53, reduced by the voltage drop across this load resistor; i.e.,

[Equation (11) is not recoverable from the source.]

From elementary circuit considerations, taking the magnitudes of currents and voltages as indicated on the drawing into account, this in turn is equal to

[Equation (12) is not recoverable from the source.]

In Equation 12 the first and second terms are constants and may be eliminated by standard well known means, while the third term is directly proportional to the desired quotient.

Hence, when the maximum autocorrelation signal $\varphi(\tau_m)$ is applied to the left-hand input terminal 54 of Fig. 3 (the upper or dividend terminal of the divider 10 of Figs. 1 and 2) and the correlation signal $\varphi(\tau_0)$ for zero delay is applied to the right-hand input terminal 55 of Fig. 3 (the lower or divisor terminal of the divider 10 of Figs. 1 and 2), the required quotient appears at the output terminal 59 of the divider (the terminal 12 of Figs. 1 and 2), namely the ratio of the maximum correlation $\varphi(\tau_m)$ to the zero delay correlation $\varphi(\tau_0)$.

The significance of these relations will readily be understood from a consideration of Fig. 4, wherein the curve A shows a representative correlation curve, normalized to unity value at zero delay, for a fully periodic signal of a certain average amount of complexity. This fully periodic signal may be termed $f_1(t)$ and its autocorrelation, curve A, may be termed $\varphi_1(\tau)$. Curve A shows four maxima of equal heights located at delays of $0$, $\tau$, $2\tau$, and $3\tau$. The time function represented thus extends over at least three full periods, each one being exactly like the last. The fact that the curve A has the value unity for zero delay expresses the truism that the time function $f_1(t)$ is exactly like itself in the absence of delay. The fact that maxima occur in the autocorrelation curve A at intervals of $\tau$ reflects the fact that each full period of the time function $f_1(t)$ resembles its predecessor and that the delayed signal finds its best match with the undelayed signal for discrete values of the delay equal to $\tau$, $2\tau$, $3\tau$ and so on. The fact that these successive maxima rise to the same height as the maximum at zero delay reflects the fact that the successive periods of the time function $f_1(t)$ are exactly alike, so that for these values of delay the match is perfect.

Curve B of Fig. 4 shows the normalized autocorrelation $\varphi_2(\tau)$ of an aperiodic signal $f_2(t)$, such as noise. It, too, has the value unity for zero delay, which reflects the truism that even a noise signal is an exact replica [...] the first part being a periodic function $f_1(t)$ and the second being an aperiodic function $f_2(t)$.

While it is not exactly true that the autocorrelation $\varphi_3(\tau)$ of the sum of two signals is equal to the sum of the autocorrelations of the signals individually, it is sufficiently nearly true to permit a graphic representation such as that of curve C of Fig. 4, which shows the autocorrelation $\varphi_3(\tau)$ of the time function or sum signal $f_3(t)$.

The mathematical formulation of the unnormalized autocorrelation of a time function $f_1(t)$ is well known and is stated in Bennett et al. Patent 2,676,206 and in Lee et al. Patent 2,643,819, as well as elsewhere. For the purposes of these patents and in other circumstances in which comparison of autocorrelations is not needed, normalization is unnecessary. In the present situation, on the other hand, normalization of the autocorrelation to the value unity at zero delay is of assistance in the actualization of further operations of the apparatus.

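The normalized autocorrelation of a time function $f(t)$ is given by the expression

$$\varphi(\tau) = \frac{\displaystyle\lim_{T\to\infty}\frac{1}{T}\int_0^T f(t)\,f(t+\tau)\,dt}{\displaystyle\lim_{T\to\infty}\frac{1}{T}\int_0^T f^2(t)\,dt} \tag{13}$$

It is evidently independent of the magnitude of $f(t)$, being dependent only on its form.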

When for the general time function $f(t)$ there is substituted in Equation 13 the specific time function of interest, namely

$$f_3(t) = f_1(t) + f_2(t) \tag{14}$$

there results

[Equation (15) is not recoverable from the source.]

where the bar indicates the rms value and $\varphi_{12}$ is the crosscorrelation of $f_1$ and $f_2$.

To a good approximation,

$$\overline{(f_1+f_2)^2} = \overline{f_1^2} + \overline{f_2^2} \tag{16}$$

for the reason that $f_1$ is purely periodic and $f_2$ is pure noise. Also, under these circumstances $\varphi_{12}(\tau)$ is small. Hence, to a good approximation,

$$\varphi_3(\tau) = \frac{\overline{f_1^2}}{\overline{f_1^2}+\overline{f_2^2}}\,\varphi_1(\tau) + \frac{\overline{f_2^2}}{\overline{f_1^2}+\overline{f_2^2}}\,\varphi_2(\tau) \tag{17}$$

From the foregoing it is plain that, taking the undelayed autocorrelation as having the value unity, the successive maxima of curve C rise to or very close to a height above the axis which is proportional to the ratio of the periodic energy in the speech to its entire energy. Looked at in another way, these maxima fall below the value unity by a distance which is proportional to the ratio of the aperiodic energy of the speech to its entire energy. Hence a measure of the amplitudes of these maxima, as compared with the amplitude of the autocorrelation curve for zero delay, constitutes a measure of the relative proportions of buzz energy and hiss energy, respectively, required for proper and realistic synthesis of artificial speech. This justifies the transmission of the output of the divider 10, derived as aforesaid, to the receiver station without further change.
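The claim that the maxima rise to the fraction of periodic energy can be checked with a small numerical experiment. Everything in this sketch is an assumption made for illustration: the sine wave standing in for the periodic part, the seeded Gaussian noise standing in for the aperiodic part, the equal energies, and the function name.

```python
import math
import random

def normalized_autocorrelation(signal, lag):
    """Autocorrelation at `lag` samples divided by its zero-delay value."""
    n = len(signal) - lag
    delayed_product = sum(signal[i] * signal[i + lag] for i in range(n)) / n
    zero_delay = sum(x * x for x in signal) / len(signal)
    return delayed_product / zero_delay

random.seed(0)        # fixed seed so that the sketch is repeatable
PERIOD = 50           # period of the periodic part in samples (hypothetical)
N = 20000

# Periodic part with mean square 0.5, noise part with mean square close to 0.5.
periodic = [math.sin(2 * math.pi * n / PERIOD) for n in range(N)]
noise = [random.gauss(0.0, 1.0) * math.sqrt(0.5) for _ in range(N)]
mixed = [p + q for p, q in zip(periodic, noise)]

# Height of the first maximum; it should fall near 0.5 / (0.5 + 0.5).
peak_mixed = normalized_autocorrelation(mixed, PERIOD)
# For the purely periodic part alone the match at one period is essentially perfect.
peak_periodic = normalized_autocorrelation(periodic, PERIOD)
```

Because the noise is random, the mixed peak only approximates the energy fraction; a longer averaging time tightens the agreement.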

Returning now to Fig. 1, the pitch control signal $1/\tau_m$ is applied to the buzz source 26 to adjust its oscillation frequency in well known fashion and as shown, for example, in Reisz Patent 2,522,539. The energy ratio signal, namely, the divider output signal as it appears on the divider output terminal 12, is likewise transmitted to the receiver station where it is employed to control the relative amounts of buzz and hiss furnished to the spectrum reconstructor in the following fashion.

The output of the buzz source 26 is delivered by way of a variable gain amplifier 61 to a combination point 62. This amplifier 61 may be of a well known construction such as to furnish a gain proportional to the magnitude of a gain control signal applied to its control terminal 63. The incoming energy distribution signal, termed $g(t)$ for short, is passed through a rooter 64 and applied to this control terminal 63. Hence the buzz component of the energy applied to the combination point 62 is proportional to the square root of $g$, and so to the amplitudes of the periodic components of the speech.

The signal $g(t)$ is also passed through a phase inverter 65 which converts it to $-g(t)$ and then through a steady potential source 66 of one volt connected in series. Thus the signal on the lead 67 beyond the source is equal to $1 - g(t)$. This in turn is passed through a rooter 68 and applied to the gain control terminal 69 of a second variable gain amplifier 70 which is connected in tandem with the hiss source 25. At this point the gain control signal has the form

$$\sqrt{1 - g(t)}$$

Thus the energy which appears at the combination point 62 comprises an additive mixture of periodic energy from the buzz source 26 and aperiodic energy from the hiss source 25 in proportions as called for by the energy distribution of the voice and as controlled by the output of the divider 10. The periodic energy of the buzz source 26 is tuned to the required pitch frequency as stated above by the first of the two outputs of the maximum value selector 7. This additive combination is now fed to the shaping networks 23 in parallel, where the slowly varying spectrum control currents operate to control the magnitudes of the several frequency sub-bands which collectively constitute the synthesized signal.
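The two gain control paths can be summarized in a short sketch. The function names are hypothetical, and the energy distribution signal `g` is assumed to lie between 0 and 1 with the buzz and hiss sources delivering equal energies.

```python
import math

def mixing_gains(g):
    """Amplitude gains for the buzz and hiss amplifiers, given g between 0 and 1."""
    buzz_gain = math.sqrt(g)           # rooter 64 acting on g
    hiss_gain = math.sqrt(1.0 - g)     # inverter 65, one-volt source 66, rooter 68
    return buzz_gain, hiss_gain

def mix(buzz_sample, hiss_sample, g):
    """Additive mixture appearing at the combination point."""
    buzz_gain, hiss_gain = mixing_gains(g)
    return buzz_gain * buzz_sample + hiss_gain * hiss_sample

# Hypothetical value: 64 percent of the speech energy is periodic.
buzz_gain, hiss_gain = mixing_gains(0.64)
total_energy_gain = buzz_gain ** 2 + hiss_gain ** 2
```

Taking square roots means that `g` sets the split of energy rather than of amplitude, so the squared gains always sum to one and the overall level does not change as the mixture shifts between buzz and hiss.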

Because of the nature of the operation of the divider 10 its output is unchanged when its two inputs are increased or reduced in the same ratio. To this extent it supplies an output which is a measure of the relative magnitudes of its two inputs and is otherwise independent of such magnitudes. In other words it carries out a normalizing function in addition to its assigned dividing function. Provided such normalization as between the autocorrelation signal for zero delay and the maximum autocorrelation signal be otherwise obtained, the energy distribution signal g(t) or a near equivalent thereof can readily be derived by a subtraction process.
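The normalizing property of the divider can be checked numerically. The Python sketch below assumes a sampled signal, integer sample lags, and that the divider forms the ratio of the maximum autocorrelation to the zero-delay autocorrelation. The function names, the fixed summation window, and the exclusion of lags below `min_lag` are illustrative assumptions rather than features taken from the specification.

```python
import math


def pitch_and_energy_ratio(x, min_lag, max_lag):
    """Return (lag of maximum autocorrelation, ratio of that maximum to R(0)).

    min_lag must be at least 1 so that the zero-delay term, which is
    always the largest, is kept out of the maximum value selection.
    """
    start = max_lag  # same summation window for every lag (an assumption)
    if min_lag < 1 or len(x) <= start:
        raise ValueError("need min_lag >= 1 and len(x) > max_lag")

    def r(lag):
        # Product of the undelayed wave and the wave delayed by `lag`, summed.
        return sum(x[i] * x[i - lag] for i in range(start, len(x)))

    r0 = r(0)                                           # measure of the entire energy
    best_lag = max(range(min_lag, max_lag + 1), key=r)  # maximum value selector
    g = r(best_lag) / r0 if r0 > 0.0 else 0.0           # divider output
    return best_lag, g


def pitch_frequency(lag, sample_rate):
    """Reciprocal of the selected delay, expressed in hertz."""
    return sample_rate / lag


# A pure tone with a period of 20 samples, then the same tone three times larger.
wave = [math.sin(2.0 * math.pi * i / 20.0) for i in range(200)]
lag_a, g_a = pitch_and_energy_ratio(wave, 5, 30)
lag_b, g_b = pitch_and_energy_ratio([3.0 * v for v in wave], 5, 30)
```

Scaling the input multiplies both autocorrelation values by the same factor, so the selected lag and the ratio are the same for both waves, which is the normalizing behavior attributed to the divider.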

What is claimed is:

1. In a system for artificial production of speech, the combination which comprises a speech analyzer station having means for deriving from a speech sound a pitch control signal, an energy distribution signal which is representative, for each sound, of its proportional content of periodic energy, and a plurality of spectrum control signals, and a reproducer station having a buzz source, a hiss source and a spectrum synthesizer, means for transmitting all of said control signals to said reproducer station, means for tuning the buzz source under control of the pitch control signal, means controlled by said energy distribution signal for mixing the outputs of said sources in proportions determined by the magnitude of said energy distribution signal, means for applying said mixed outputs to said spectrum synthesizer, and means for controlling said spectrum synthesizer under control of said spectrum control signals.

2. In a system for deriving control signals to control the artificial production of speech, means for analyzing a speech sound, means for deriving from said analysis a pitch control signal representative of the fundamental frequency of said speech sound, means for deriving from said analysis a plurality of spectrum control signals, each representative of the speech energy falling within one of a plurality of frequency sub-bands which collectively embrace the frequency band of said speech sound, means for deriving from said analysis a first measure of the entire energy of said speech sound, means for deriving from said analysis a second measure of the energy of the periodic components of said speech sound, and means for deriving from said two measures an additional signal representative of the distribution of the energy of said speech sound as between its periodic components and its aperiodic components.

3. Apparatus as defined in claim 1 wherein said means for deriving said pitch control signal comprises an autocorrelator and a maximum value selector.

4. Apparatus as defined in claim 1 wherein said means for deriving said pitch control signal comprises means for determining the autocorrelation of said speech sound for various delays, and means for selecting from among said various delays that one for which said autocorrelation as determined is a maximum.

5. In combination with apparatus as defined in claim 4, means for deriving a signal which is substantially inversely proportional to said selected delay.

6. In a system for deriving control signals to control the artificial production of speech, means for analyzing a speech wave, means for deriving from said analysis a pitch control signal representative of the fundamental frequency of said speech wave, means for deriving from said analysis a plurality of spectrum control signals representative of the speech energy falling within a plurality of frequency sub-bands which collectively embrace the frequency band of said speech wave, means for deriving from said analysis a first measure of the entire energy of said speech wave, means for deriving from said analysis a second measure of the energy of the periodic components of said speech wave, and means for deriving from said two measures an additional signal representative of the distribution of the energy of said speech wave as between its periodic components and its aperiodic components, said pitch control signal deriving means comprising means for delaying said wave by each of a plurality of different time lags distributed over a range extending substantially from zero to the longest period of said speech wave, means for individually comparing each delayed wave singly with the original undelayed wave, means for identifying that one of said different time lags for which the delayed wave most nearly resembles the undelayed wave as determined by said comparison, means for deriving an auxiliary signal proportional to the duration of said identified time lag, and means for reciprocating said auxiliary signal to provide said pitch control signal.

7. Apparatus for deriving a desired control signal indicative of the fundamental frequency of a complex signal wave which comprises means for delaying said wave by each of a plurality of different time lags distributed over a range extending substantially from zero to the longest period of said complex signal wave, means for individually comparing each delayed wave singly with the original undelayed wave, means for identifying that one of said different time lags for which the delayed wave most nearly resembles the undelayed wave as determined by said comparison, means for deriving an auxiliary signal proportional to the duration of said identified time lag, and means for reciprocating said auxiliary signal to provide said desired control signal.

8. In a system for artificial production of speech sounds, a tunable source of periodic energy, a source of aperiodic energy, means for tuning said periodic energy source under control of a pitch control signal, means controlled by an energy distribution signal which is representative, for each sound to be reproduced, of its proportional content of periodic energy for mixing the outputs of said two sources in proportions determined by the magnitude of said energy distribution signal, a plurality of filters having passbands contiguously located on the frequency scale and together embracing the frequency band of said speech sound, means for applying said mixed outputs to all of said filters, means for variably attenuating the energy path of each of said filters under control of a spectrum control signal, and means for reproducing the outputs of all of said filters as an artificial sound.

9. Apparatus for selecting from among a plurality of signals that one which has the greatest value which comprises a plurality of discharge devices each having an anode, a cathode and a control electrode, a first connection extending from all of said anodes to one terminal of an operating potential source, a second connection extending from all of said cathodes to one terminal of a common impedance element, a third connection extending from the other terminal of said common impedance element to the other terminal of said operating potential source, a like plurality of relays each having a winding connected in the anode circuit of one of said discharge devices, each of said relays being provided with two contact pairs, means for applying said input signals individually to the control electrodes of said discharge devices, a first common bus, individual energy paths extending from said first common bus through the first contact pair of each relay to the control electrode of the device in whose anode circuit said relay is connected, a second common bus, a like plurality of auxiliary potential sources, of successively greater potentials, associated respectively with the several relays, and an energy path extending from said second common bus through the second contact pair of each relay to its associated auxiliary source.

10. Apparatus having a first input point, a second input point and an output point for delivering at said output point a signal proportional to the quotient of a dividend signal by a divisor signal of which the dividend signal is applied to said first input point while the divisor signal is applied to said second input point which comprises a first element having an exponential current-voltage characteristic connected to the first input point, a second element having a similar exponential current-voltage characteristic connected to the second input point, means for applying said dividend and divisor signals as currents to said first and second input points, respectively, thereby to produce a first voltage which is proportional to the logarithm of said first input signal and a second voltage which is proportional to the logarithm of said second input signal, means for subtracting said second voltage from said first voltage, and a third element having a similar exponential current-voltage characteristic for deriving from the difference of said first and second voltages a current which is exponentially related to said voltage difference and hence directly related to the quotient of said dividend signal by said divisor signal.

11. In combination with a source of a speech wave, apparatus for continuously determining the continuously varying fundamental period of said speech wave which comprises means for delaying said wave by each of a plurality of different time lags distributed over a range extending substantially from zero to the longest period of said speech wave, means for individually comparing each delayed wave singly with the original undelayed wave, means for identifying that one of said different time lags for which the delayed wave most closely resembles the undelayed wave as determined by said comparison, means for rejecting all others of said time lags, means for continually altering the time lag identified to preserve said closest resemblance as said fundamental period changes, and means for developing a signal continuously representative of said varying identified time lag.

12. Apparatus as defined in claim 11 wherein said comparing means comprises means for determining the autocorrelation of said speech wave for various delays.

13. Apparatus as defined in claim 12 wherein said identifying means comprises means for selecting that value of the time lag for which said autocorrelation as determined is a maximum.

14. Apparatus as defined in claim 11 wherein said comparing means comprises means for multiplying the de[...]est speech wave period, means for comparing each of said replicas with the undelayed wave, means for identifying the replica which most closely matches the original wave as determined by said comparison, means for rejecting all others of said replicas, whereby the delay characterizing the replica thus identified at each moment is equal to the length of said fundamental period at that moment, means for continually altering said identification to preserve said closest match as said fundamental period changes, and means for developing a signal continuously representative of the delay characterizing the replica momentarily identified.
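The quotient apparatus recited in claim 10 rests on the identity a/b = exp(ln a - ln b): two logarithmic elements, a subtraction, and one exponential element. A minimal numerical sketch in Python follows. The function name `log_domain_divide` is hypothetical, and the restriction to positive inputs is an assumption reflecting the fact that a logarithm is defined only for positive values.

```python
import math


def log_domain_divide(dividend, divisor):
    """Quotient formed by subtracting logarithms and exponentiating."""
    if dividend <= 0.0 or divisor <= 0.0:
        # Assumed restriction: both signals must be positive.
        raise ValueError("dividend and divisor must be positive")
    first_voltage = math.log(dividend)          # first exponential-characteristic element
    second_voltage = math.log(divisor)          # second exponential-characteristic element
    difference = first_voltage - second_voltage  # subtracting means
    return math.exp(difference)                 # third element restores the quotient


quotient = log_domain_divide(6.0, 3.0)
scaled_quotient = log_domain_divide(60.0, 30.0)
```

Scaling both inputs by the same factor adds the same amount to each logarithm, so the difference, and with it the quotient, is unchanged.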


